AITopics | violence detection

1f471322127d6347e5ae09a14b1e5cf7-Paper-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 05:02:05 GMT

dataset, information, representation, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Chongqing Province > Chongqing (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling

Jung, Seoik, Song, Taekyung, Lee, Yangro, Lee, Sungjun

arXiv.org Artificial Intelligence | Nov-17-2025

This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footage. Unlike conventional long-video training approaches, the proposed method divides videos into 1-2 second clips and applies Large Language Model (LLM)-based auto-caption labeling to construct fine-grained datasets. Each short clip fully utilizes all frames to preserve temporal continuity, enabling precise recognition of rapid violent events. Experiments demonstrate that the proposed method achieves 95.25% accuracy on RWF-2000 and significantly improves performance on long videos (UCF-Crime: 83.25%), confirming its strong generalization and real-time applicability in intelligent surveillance systems. Recently, video-based violence and abnormal behavior detection has been gaining attention as an essential core technology in fields such as public safety, smart cities, and intelligent surveillance [1].
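
The clip-splitting step in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `split_into_clips`, the 30 fps frame rate, the stride parameter, and the choice to drop a trailing partial window are all assumptions. The abstract only states that videos are divided into 1-2 second clips and that every frame in a clip is used.

```python
from typing import List, Tuple


def split_into_clips(
    num_frames: int,
    fps: float,
    window_sec: float = 2.0,
    stride_sec: float = 2.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, for short clips.

    Every frame inside a window is kept (no frame subsampling). A trailing
    remainder shorter than one window is dropped in this sketch.
    """
    window = int(round(window_sec * fps))
    stride = int(round(stride_sec * fps))
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must cover at least one frame")

    clips = []
    start = 0
    while start + window <= num_frames:
        clips.append((start, start + window))
        start += stride
    return clips


# Hypothetical 5-second video at 30 fps (150 frames), cut into 2-second clips.
clips = split_into_clips(num_frames=150, fps=30.0)

# The same video with a 1-second stride, so consecutive clips overlap.
overlapping = split_into_clips(num_frames=150, fps=30.0, stride_sec=1.0)
```

Setting the stride below the window length yields overlapping clips, which is one way to read "sliding" in the title; whether the paper overlaps its windows is not stated in the abstract.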

large language model, machine learning, real time system, (18 more...)

arXiv.org Artificial Intelligence

2511.10866

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Architecture > Real Time Systems (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Federated Learning for Video Violence Detection: Complementary Roles of Lightweight CNNs and Vision-Language Models for Energy-Efficient Use

Thuau, Sébastien, Haidar, Siba, Chelouah, Rachid

arXiv.org Artificial Intelligence | Nov-11-2025

Deep learning-based video surveillance increasingly demands privacy-preserving architectures with low computational and environmental overhead. Federated learning preserves privacy, but deploying large vision-language models (VLMs) introduces major energy and sustainability challenges. We compare three strategies for federated violence detection under realistic non-IID splits on the RWF-2000 and RLVS datasets: zero-shot inference with pretrained VLMs, LoRA-based fine-tuning of LLaVA-NeXT-Video-7B, and personalized federated learning of a 65.8M-parameter 3D CNN. All methods exceed 90% accuracy in binary violence detection. The 3D CNN achieves superior calibration (ROC AUC 92.59%) at roughly half the energy cost (240 Wh vs. 570 Wh) of federated LoRA, while VLMs provide richer multimodal reasoning. Hierarchical category grouping (based on semantic similarity and class exclusion) boosts VLM multiclass accuracy from 65.31% to 81% on the UCF-Crime dataset. To our knowledge, this is the first comparative simulation study of LoRA-tuned VLMs and personalized CNNs for federated violence detection, with explicit energy and CO2e quantification. Our results inform hybrid deployment strategies that default to efficient CNNs for routine inference and selectively engage VLMs for complex contextual reasoning.
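
One way to picture the hybrid deployment strategy mentioned at the end of this abstract is a confidence-gated router. The sketch below is an assumption-laden illustration, not the paper's method: the function `route_clip`, the 0.2/0.8 thresholds, and the stand-in `fake_vlm` callable are hypothetical, and the abstract does not say how clips are selected for VLM processing.

```python
from typing import Callable, Tuple


def route_clip(
    cnn_prob: float,
    vlm_classify: Callable[[], bool],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[bool, str]:
    """Return (is_violent, model_used) for one clip.

    cnn_prob is the CNN's predicted probability of violence. Confident
    predictions are accepted directly; ambiguous ones are escalated to the
    more expensive VLM, which is only called when needed.
    """
    if cnn_prob >= high:
        return True, "cnn"
    if cnn_prob <= low:
        return False, "cnn"
    return vlm_classify(), "vlm"


def fake_vlm() -> bool:
    """Stand-in for a real VLM call; always answers 'violent' in this toy."""
    return True


# Three hypothetical CNN scores: clearly benign, ambiguous, clearly violent.
decisions = [route_clip(p, fake_vlm) for p in (0.05, 0.5, 0.95)]
```

Passing the VLM as a callable keeps the expensive model lazy: it runs only for clips that fall inside the ambiguous band.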

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.07171

Country: Europe > France (0.15)

Genre: Research Report > New Finding (0.34)

Industry:

Energy (0.47)
Law (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Frugal Federated Learning for Violence Detection: A Comparison of LoRA-Tuned VLMs and Personalized CNNs

Thuau, Sébastien, Haidar, Siba, Bajracharya, Ayush, Chelouah, Rachid

arXiv.org Artificial Intelligence | Oct-21-2025

We examine frugal federated learning approaches to violence detection by comparing two complementary strategies: (i) zero-shot and federated fine-tuning of vision-language models (VLMs), and (ii) personalized training of a compact 3D convolutional neural network (CNN3D). Using LLaVA-7B and a 65.8M-parameter CNN3D as representative cases, we evaluate accuracy, calibration, and energy usage under realistic non-IID settings. Both approaches exceed 90% accuracy. CNN3D slightly outperforms Low-Rank Adaptation (LoRA)-tuned VLMs in ROC AUC and log loss, while using less energy. VLMs remain favorable for contextual reasoning and multimodal inference. We quantify energy and CO2 emissions across training and inference, and analyze sustainability trade-offs for deployment. To our knowledge, this is the first comparative study of LoRA-tuned vision-language models and personalized CNNs for federated violence detection, with an emphasis on energy efficiency and environmental metrics. These findings support a hybrid model: lightweight CNNs for routine classification, with selective VLM activation for complex or descriptive scenarios. The resulting framework offers a reproducible baseline for responsible, resource-aware AI in video surveillance, with extensions toward real-time, multimodal, and lifecycle-aware systems.
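
The two metrics named in this abstract, ROC AUC and log loss, can be computed from predicted probabilities as sketched below. This uses the standard textbook definitions in plain Python; the toy labels and scores are made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; overconfident wrong predictions cost the most."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = violent) and predicted probabilities for four clips.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.7]

auc = roc_auc(y_true, y_prob)
loss = log_loss(y_true, y_prob)
```

ROC AUC depends only on how the scores rank the clips, while log loss also penalizes probabilities that are far from the true label, so the two can disagree about which model is better.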

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.17651

Country: Europe > France > Île-de-France (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Energy (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection

Senadeera, Damith Chamalke, Yang, Xiaoyun, Li, Shibo, Awais, Muhammad, Kollias, Dimitrios, Slabaugh, Gregory

arXiv.org Artificial Intelligence | Sep-29-2025

The rapid proliferation of surveillance cameras has increased the demand for automated violence detection. While CNNs and Transformers have shown success in extracting spatio-temporal features, they struggle with long-term dependencies and computational efficiency. We propose Dual Branch VideoMamba with Gated Class Token Fusion (GCTF), an efficient architecture combining a dual-branch design and a state-space model (SSM) backbone, where one branch captures spatial features while the other focuses on temporal dynamics. The model performs continuous fusion via a gating mechanism between the branches to enhance its ability to detect violent activities even in challenging surveillance scenarios. We also present a new benchmark for video violence detection by merging the RWF-2000, RLVS, SURV and VioPeru datasets, ensuring strict separation between training and testing sets. Experimental results demonstrate that our model achieves state-of-the-art performance on this benchmark and on the DVD dataset, another novel video violence detection dataset, offering an optimal balance between accuracy and computational efficiency and demonstrating the promise of SSMs for scalable, near real-time surveillance violence detection.
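
The idea of fusing two branch outputs through a gate can be illustrated with a generic element-wise gated fusion. This is a sketch under assumptions, not the GCTF module itself: the abstract does not give the gate's exact form, so the sigmoid over the concatenated tokens, the function `gated_fusion`, and the parameter names `gate_weights` and `gate_bias` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    spatial: Sequence[float],
    temporal: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Fuse two class tokens of size d with an element-wise gate.

    gate_weights[k] has length 2 * d and scores the concatenation
    [spatial; temporal] for output dimension k. The gate g lies in (0, 1);
    the output is g * spatial + (1 - g) * temporal.
    """
    d = len(spatial)
    if len(temporal) != d or len(gate_weights) != d or len(gate_bias) != d:
        raise ValueError("all inputs must agree on the token dimension")

    concat = list(spatial) + list(temporal)
    fused = []
    for k in range(d):
        score = sum(w * x for w, x in zip(gate_weights[k], concat)) + gate_bias[k]
        g = sigmoid(score)
        fused.append(g * spatial[k] + (1.0 - g) * temporal[k])
    return fused


# With all-zero gate parameters the gate is 0.5, so the result is a plain average.
fused = gated_fusion(
    spatial=[1.0, 0.0],
    temporal=[0.0, 1.0],
    gate_weights=[[0.0] * 4, [0.0] * 4],
    gate_bias=[0.0, 0.0],
)
```

In a trained model the gate parameters would be learned, letting the network lean on the spatial branch for some feature dimensions and on the temporal branch for others.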

artificial intelligence, machine learning, spatial reasoning, (14 more...)

arXiv.org Artificial Intelligence

2506.03162

Country: Europe (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Commercial Services & Supplies > Security & Alarm Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.66)

Intelligent Image Sensing for Crime Analysis: A ML Approach towards Enhanced Violence Detection and Investigation

Dutta, Aritra, Boral, Pushpita, Suseela, G

arXiv.org Artificial Intelligence | Jun-18-2025

The increasing global crime rate, coupled with substantial human and property losses, highlights the limitations of traditional surveillance methods in promptly detecting diverse and unexpected acts of violence. Addressing this pressing need for automatic violence detection, we leverage Machine Learning to detect and categorize violent events in video streams. This paper introduces a comprehensive framework for violence detection and classification, employing Supervised Learning for both binary and multi-class violence classification. The detection model relies on 3D Convolutional Neural Networks, while the classification model utilizes the separable convolutional 3D model for feature extraction and bidirectional LSTM for temporal processing. Training is conducted on diverse customized datasets with frame-level annotations, incorporating videos from surveillance cameras, human recordings, and the Hockey Fight, SOHAS, and WVD datasets across various platforms. Additionally, a camera module integrated with a Raspberry Pi captures a live video feed, which is sent to the ML model for processing. The framework thus demonstrates improved performance in terms of computational resource efficiency and accuracy.

artificial intelligence, machine learning, video, (14 more...)

arXiv.org Artificial Intelligence

2506.1391

Genre: Research Report > New Finding (0.47)

Industry:

Commercial Services & Supplies > Security & Alarm Services (0.89)
Leisure & Entertainment > Sports > Hockey (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cross-Platform Violence Detection on Social Media: A Dataset and Analysis

Chen, Celia, Beland, Scotty, Burghardt, Ingo, Byczek, Jill, Conway, William J., Cotugno, Eric, Davre, Sadaf, Fletcher, Megan, Gnanasekaran, Rajesh Kumar, Hamilton, Kristin, Harbert, Marilyn, Heustis, Jordan, Jha, Tanaya, Klein, Emily, Kramer, Hayden, Leitch, Alex, Perkins, Jessica, Sherman, Casi, Sterrn, Celia, Stevens, Logan, Zarrella, Rebecca, Golbeck, Jennifer

arXiv.org Artificial Intelligence | Jun-5-2025

Violent threats remain a significant problem across social media platforms. Useful, high-quality data facilitates research into the understanding and detection of malicious content, including violence. In this paper, we introduce a cross-platform dataset of 30,000 posts hand-coded for violent threats and sub-types of violence, including political and sexual violence. To evaluate the signal present in this dataset, we perform a machine learning analysis with an existing dataset of violent comments from YouTube. We find that, despite originating from different platforms and using different coding criteria, we achieve high classification accuracy both by training on one dataset and testing on the other, and in a merged dataset condition. These results have implications for content-classification strategies and for understanding violent content across social media.
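
The evaluation protocol described here (train on one dataset and test on the other, plus a merged condition) can be sketched as below. Everything in the sketch is illustrative: the keyword-voting classifier is a deliberately simple stand-in for whatever model the authors used, the even/odd split of the merged data is an assumption, and the four example posts are invented placeholders rather than items from either dataset.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, label) with 1 = violent threat, 0 = not
Model = Callable[[str], int]


def train_keyword_model(train: List[Example]) -> Model:
    """Toy classifier: each word votes for the class it was seen with more often."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in train:
        counts[label].update(text.lower().split())

    def predict(text: str) -> int:
        score = sum(counts[1][w] - counts[0][w] for w in text.lower().split())
        return 1 if score > 0 else 0

    return predict


def accuracy(model: Model, test: List[Example]) -> float:
    return sum(model(text) == label for text, label in test) / len(test)


def cross_dataset_report(a: List[Example], b: List[Example]) -> Dict[str, float]:
    """Accuracy for the two cross-dataset directions and a merged condition."""
    merged = a + b
    merged_train, merged_test = merged[0::2], merged[1::2]  # assumed split
    return {
        "train_a_test_b": accuracy(train_keyword_model(a), b),
        "train_b_test_a": accuracy(train_keyword_model(b), a),
        "merged": accuracy(train_keyword_model(merged_train), merged_test),
    }


# Invented placeholder posts standing in for two platforms' datasets.
platform_a = [("explicit threat", 1), ("nice day", 0)]
platform_b = [("another threat", 1), ("good day", 0)]

report = cross_dataset_report(platform_a, platform_b)
```

High accuracy in both cross-dataset directions is the kind of evidence the abstract describes: it suggests the labels capture a signal that transfers between platforms and coding schemes.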

artificial intelligence, machine learning, violence, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3717867.3717877

2506.03312

Country: North America > United States > Maryland (0.28)

Genre: Research Report (0.82)

Industry:

Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.48)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Exploring Personalized Federated Learning Architectures for Violence Detection in Surveillance Videos

Kassir, Mohammad, Haidar, Siba, Yaacoub, Antoun

arXiv.org Artificial Intelligence | Apr-1-2025

The challenge of detecting violent incidents in urban surveillance systems is compounded by the voluminous and diverse nature of video data. This paper presents a targeted approach using Personalized Federated Learning (PFL) to address these issues, specifically employing the Federated Learning with Personalization Layers method within the Flower framework. Our methodology adapts learning models to the unique data characteristics of each surveillance node, effectively managing the heterogeneous and non-IID nature of surveillance video data. Through rigorous experiments conducted on balanced and imbalanced datasets, our PFL models demonstrated enhanced accuracy and efficiency, achieving up to 99.3% accuracy. This study underscores the potential of PFL to significantly improve the scalability and effectiveness of surveillance systems, offering a robust, privacy-preserving solution for violence detection in complex urban environments.
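
The core idea behind federated learning with personalization layers is that clients share and average only their base layers while each keeps its own top layers locally. The sketch below shows that split in plain Python; it deliberately does not use the Flower API, and the parameter layout, the layer names `base` and `head`, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]


def aggregate_base_layers(
    client_params: Sequence[Params],
    num_examples: Sequence[int],
    base_keys: Sequence[str],
) -> Params:
    """Weighted-average only the shared base layers (FedAvg-style).

    Personalization layers are left out, so they never leave the client.
    """
    total = sum(num_examples)
    averaged: Params = {}
    for key in base_keys:
        size = len(client_params[0][key])
        averaged[key] = [
            sum(p[key][i] * n for p, n in zip(client_params, num_examples)) / total
            for i in range(size)
        ]
    return averaged


def apply_global_update(local: Params, global_base: Params) -> Params:
    """Overwrite base layers with the aggregate; keep personal layers as they are."""
    updated = {key: list(values) for key, values in local.items()}
    for key, values in global_base.items():
        updated[key] = list(values)
    return updated


# Two hypothetical surveillance nodes with 1 and 3 local training examples.
clients = [
    {"base": [1.0, 2.0], "head": [10.0]},
    {"base": [3.0, 4.0], "head": [20.0]},
]
global_base = aggregate_base_layers(clients, num_examples=[1, 3], base_keys=["base"])
client0 = apply_global_update(clients[0], global_base)
```

After the update every client holds the same base weights, while each `head` still reflects only that node's data, which is how the approach accommodates non-IID video across nodes.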

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2504.00857

Country:

Europe > France > Île-de-France > Paris > Paris (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT

Jiang, Wen-Dong, Chang, Chih-Yung, Roy, Diptendu Sinha

arXiv.org Artificial Intelligence | Jan-7-2025

Recently, violence detection systems developed using unified multimodal models have achieved significant success and attracted widespread attention. However, most of these systems face two critical challenges: the lack of interpretability as black-box models and limited functionality, offering only classification or retrieval capabilities. To address these challenges, this paper proposes a novel interpretable violence detection system, termed the Three-in-One (TIO) System. The TIO system integrates knowledge graphs (KG) and graph attention networks (GAT) to provide three core functionalities: detection, retrieval, and explanation. Specifically, the system processes each video frame along with text descriptions generated by a large language model (LLM) for videos containing potential violent behavior. It employs ImageBind to generate high-dimensional embeddings for constructing a knowledge graph, uses GAT for reasoning, and applies lightweight time series modules to extract video embedding features. The final step connects a classifier and retriever for multi-functional outputs. The interpretability of KG enables the system to verify the reasoning process behind each output. Additionally, the paper introduces several lightweight methods to reduce the resource consumption of the TIO system and enhance its efficiency. Extensive experiments conducted on the XD-Violence and UCF-Crime datasets validate the effectiveness of the proposed system. A case study further reveals an intriguing phenomenon: as the number of bystanders increases, the occurrence of violent behavior tends to decrease.
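
The GAT reasoning step mentioned in this abstract relies on attention over a node's neighbors in the graph. The sketch below shows a generic single-head graph attention computation on already-projected node features; it is not the TIO system's code, and the tiny graph, the feature values, and the function names are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_weights(
    node: int,
    neighbors: Sequence[int],
    features: Dict[int, List[float]],
    attn_vec: Sequence[float],
) -> Dict[int, float]:
    """Softmax over neighbors of LeakyReLU(a . [h_node || h_neighbor])."""
    scores = []
    for j in neighbors:
        concat = features[node] + features[j]  # list concatenation
        scores.append(leaky_relu(sum(a * x for a, x in zip(attn_vec, concat))))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return {j: e / norm for j, e in zip(neighbors, exps)}


def aggregate(
    node: int,
    neighbors: Sequence[int],
    features: Dict[int, List[float]],
    attn_vec: Sequence[float],
) -> List[float]:
    """Attention-weighted sum of neighbor features for one node."""
    alpha = attention_weights(node, neighbors, features, attn_vec)
    dim = len(features[node])
    return [sum(alpha[j] * features[j][k] for j in neighbors) for k in range(dim)]


# Invented 3-node graph; node 0 attends to nodes 1 and 2.
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [2.0, 0.0]}
uniform = aggregate(0, [1, 2], features, attn_vec=[0.0, 0.0, 0.0, 0.0])
```

Because the attention weights are explicit per-edge numbers, they can be inspected to see which neighboring nodes influenced an output, which fits the interpretability goal the abstract describes.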

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.06224

Country: Asia > India (0.28)

Genre: Research Report (1.00)

Industry:

Transportation (0.66)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
(3 more...)
